
William Gann's Hyperparameter Tuning of Feature Engineering Pipelines using GridSearchCV

From TradingHabits, the trading encyclopedia · 5 min read · February 28, 2026

Hyperparameter tuning is the process of finding the optimal hyperparameters for a machine learning model. In the context of feature engineering pipelines, this includes tuning the parameters of the transformers, such as the window size of a moving average or the number of components in a PCA.
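To make the idea concrete, here is a minimal sketch of a tunable transformer. The `MovingAverageTransformer` name and its `window` parameter are illustrative, not from any specific library; the point is that any parameter accepted in `__init__` becomes tunable by GridSearchCV.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MovingAverageTransformer(BaseEstimator, TransformerMixin):
    """Appends a trailing moving average of the first column as a new feature.

    Hypothetical example; `window` is the hyperparameter a grid search can tune.
    """
    def __init__(self, window=10):
        self.window = window

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        prices = X[:, 0]
        # Trailing mean over the last `window` observations (shorter at the start,
        # so no future data is ever used).
        ma = np.array([prices[max(0, i - self.window + 1):i + 1].mean()
                       for i in range(len(prices))])
        return np.column_stack([X, ma])
```

Because `BaseEstimator` provides `get_params()`/`set_params()`, a search can address this parameter as `moving_average__window` when the step is registered under the name `moving_average` in a `Pipeline`.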

Scikit-Learn's GridSearchCV provides a simple and effective way to perform hyperparameter tuning.

GridSearchCV and RandomizedSearchCV

GridSearchCV performs an exhaustive search over a specified parameter grid, evaluating every combination. RandomizedSearchCV, on the other hand, samples a fixed number of candidates from specified parameter distributions. RandomizedSearchCV is often preferred for large parameter spaces, since its cost is controlled by the number of samples (`n_iter`) rather than the size of the grid.

python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# `pipeline` is assumed to be a Pipeline with a step named "moving_average"
# that exposes `short_window` and `long_window` parameters.
# Grid keys follow scikit-learn's "<step_name>__<parameter>" convention.
parameters = {
    'moving_average__short_window': [10, 20, 50],
    'moving_average__long_window': [100, 200, 300],
}

grid_search = GridSearchCV(pipeline, parameters, cv=5)
grid_search.fit(X, y)  # X, y: feature matrix and target, assumed defined
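For comparison, here is a self-contained RandomizedSearchCV sketch. The PCA-plus-Ridge pipeline and the synthetic regression data are stand-ins for illustration, not the article's actual feature engineering pipeline:

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

# Toy stand-ins for the article's pipeline and data.
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)
pipeline = Pipeline([("pca", PCA()), ("model", Ridge())])

# Sample 10 candidates from distributions instead of enumerating a full grid.
param_distributions = {
    "pca__n_components": randint(2, 9),      # integers 2..8, sampled
    "model__alpha": [0.01, 0.1, 1.0, 10.0],  # lists are sampled uniformly
}

search = RandomizedSearchCV(pipeline, param_distributions,
                            n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_)
```

Passing a scipy distribution (here `randint`) lets the search draw fresh values each iteration, which is what makes it cheaper than an exhaustive grid on large spaces.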

Cross-Validation Strategies for Financial Time Series

Standard cross-validation techniques, such as shuffled k-fold cross-validation, are not suitable for financial time series data: shuffling lets future observations leak into the training folds, producing look-ahead bias and optimistic scores. Instead, we need cross-validation strategies designed for time series data, such as TimeSeriesSplit, which always trains on the past and tests on the future.

python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)
grid_search = GridSearchCV(pipeline, parameters, cv=tscv)
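To see why TimeSeriesSplit prevents leakage, it helps to print the fold indices it generates. In every split the training indices strictly precede the test indices:

```python
from sklearn.model_selection import TimeSeriesSplit

# 12 consecutive observations standing in for a time-ordered dataset.
X = list(range(12))

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # Training indices always precede test indices: no look-ahead leakage.
    print("train:", list(train_idx), "test:", list(test_idx))
```

Each successive fold extends the training window forward, so the model is only ever evaluated on data that comes after everything it was fitted on.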

Mathematical Formulation: Cross-Validation

The goal of cross-validation is to estimate a model's generalization error. The data is split into k folds; the model is trained on k − 1 folds and evaluated on the held-out fold, and the average of the k held-out errors serves as the estimate:

$$E_{cv} = \frac{1}{k} \sum_{i=1}^{k} E_i$$

Where:

  • $E_{cv}$ is the cross-validation error.
  • $k$ is the number of folds.
  • $E_i$ is the error on the $i$-th fold.
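As a concrete instance of this average (the per-fold errors below are invented for illustration):

```python
# Hypothetical per-fold errors E_i from a k = 5 cross-validation run.
fold_errors = [0.21, 0.18, 0.25, 0.20, 0.16]

# E_cv = (1/k) * sum(E_i)
e_cv = sum(fold_errors) / len(fold_errors)
print(round(e_cv, 4))
```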
Best parameters found by the grid search (example values):

Parameter       Value
short_window    20
long_window     200
score           0.75

By tuning the hyperparameters of your feature engineering pipeline, you can significantly improve the performance of your trading models.